Abstract


We prove convergence of a distributed gradient projection method for optimal routing in a data communication network. The analysis is carried out without any synchronization assumptions and takes into account the possibility of transients caused by updates in the routing strategy being used.

I. INTRODUCTION

The most popular formulation of the optimal distributed routing problem in a data network is based on a multicommodity flow optimization whereby a separable objective function of the form

$$\sum_{(i,j)} D^{ij}(F^{ij})$$

is minimized with respect to the flow variables $F^{ij}$, subject to multicommodity flow constraints ([1], [2], [3], [12]). Here $(i,j)$ denotes a generic directed network link, and $D^{ij}$ is a strictly convex, differentiable, increasing function of $F^{ij}$, which represents in turn the total traffic arrival rate on link $(i,j)$, measured for example in packets or bits per second.

We want to find a routing that minimizes this objective. By a routing we mean a set of active paths for each origin-destination (OD) pair (the set of paths carrying some traffic of that OD pair), together with the fraction of total traffic of the OD pair routed along each active path.

A  typical  example  of  a  distributed  routing  algorithm 
operates  roughly  as  follows: 

The total link arrival rates $F^{ij}$ are measured by time averaging over a period of time, and are communicated to all network nodes. Upon reception of these measured rates each node updates the part of the routing dealing with traffic originating at that node. The updating method is based on some rule, e.g. a shortest path method [2], [4], or an iterative optimization algorithm [1], [5], [6].

There are a number of variations of this idea: for example, some relevant function of $F^{ij}$ may be measured in place of $F^{ij}$, or a somewhat different type of routing policy may be used, but these will not concern us for the time being. The preceding algorithm is used in this paper as an example which is interesting in its own right but also involves ideas that are common to other types of routing algorithms.

Most of the existing analysis of distributed routing algorithms such as the one above is predicated on several assumptions that are to some extent violated in practice. These are:

a) The quasistatic assumption, i.e. the external traffic arrival rate for each OD pair is constant over time. This assumption is approximately valid when there is a large number of user-pair conversations associated with each OD pair, and each of these conversations has an arrival rate that is small relative to the total arrival rate for the OD pair (i.e. a "many small users" assumption). An asymptotic analysis of the effect of violation of this assumption on the stationary character of the external traffic arrival rates is given in [7].

b) The fast settling time assumption, i.e. transients in the flows $F^{ij}$ due to changes in routing are negligible. In other words, once the routing is updated, the flows $F^{ij}$ settle to their new values within a time which is very small relative to the time between routing updates. This assumption is typically valid in datagram networks but less so in virtual circuit networks, where existing virtual circuits may not be rerouted after a routing update. When this assumption is violated, link flow measurements $F^{ij}$ reflect a dependence not just on the current routing but also on possibly several past routings. A seemingly good model is to represent each $F^{ij}$ as a convex combination of the rates of arrival at $(i,j)$ corresponding to two or more past routing updates.

c) The synchronous update assumption, i.e. all link rates $F^{ij}$ are measured simultaneously, and are received simultaneously at all network nodes, which in turn simultaneously carry out a routing update. However, there may be technical reasons (such as software complexity) that argue against enforcing a synchronous update protocol. For example, the distributed routing algorithm of the ARPANET [4] is not operated synchronously.

In this paper we show that projection methods, one of the most interesting classes of algorithms for distributed optimal routing, are valid even if the settling time and synchronous update assumptions are violated to a considerable extent. Even though we retain the quasistatic assumption in our analysis, we conjecture that the result of this paper can be generalized along the lines of another related study [7], whereby it is shown that a routing algorithm based on a shortest path rule converges to a neighborhood of the optimum. The size of this neighborhood depends on the extent of violation of the quasistatic assumption. A similar deviation from optimality can be caused by errors in the measurement of $F^{ij}$. In our analysis these errors are neglected.

In the next section we provide some background on distributed asynchronous algorithms and discuss the relation of the result of the present paper with earlier analyses. In Section III we formulate our class of distributed asynchronous routing algorithms, while Section IV provides the convergence analysis.

II.  ASYNCHRONOUS  OPTIMIZATION  ALGORITHMS 

We provide here a brief discussion of the currently available theory and tools of analysis of asynchronous distributed algorithms. In a typical such algorithm (aimed at solving an optimization problem) each processor $i$ has in its memory a vector $x^i$ which may be interpreted as an estimate of an optimal solution. Each processor obtains measurements, performs computations and updates some of the components of its vector. Concerning the other components, it relies entirely on messages received from other processors. We are mainly interested in the case where minimal assumptions are placed on the orderliness of message exchanges.

There are two distinct approaches for analyzing algorithmic convergence. The first approach is essentially a generalization of the Lyapunov function method for proving convergence of centralized iterative processes. The idea here is that, no matter what the precise sequence of message exchanges is, each update by any processor brings its vector $x^i$ closer to the optimum in some sense. This approach applies primarily to problems involving monotone or contraction mappings with respect to a "sup"-norm (e.g. a distributed shortest path algorithm) [8], [9]; it is only required that each processor communicates to every other processor an infinite number of times.

The second approach is based on the idea that if the processors communicate fast enough relative to the speed of convergence of the computation, then the evolution of their solution estimates $x^i$ may be (up to first order in the step-size used) the same as if all processors were communicating to each other at each time instance [10], [11]. The latter case is, however, mathematically equivalent to a centralized (synchronous) algorithm for which there is an abundance of techniques and results. Notice that in this approach, slightly stronger assumptions are placed on the nature of the communication process than in the first one. This is compensated by the fact that the corresponding method of analysis applies to broader classes of algorithms.


Unfortunately, the results available cannot be directly applied to the routing problem studied in this paper and a new proof is required. A main reason is that earlier results concern algorithms for unconstrained optimization. In the routing problem, the non-negativity and the conservation of flow introduce inequality and equality constraints. While equality constraints could be taken care of by eliminating some of the variables, inequality constraints must be explicitly taken into account. Another difference arises because, in the routing algorithm, optimization is carried out with respect to path flow variables, whereas the messages being broadcast contain estimates of the link flows (see next section). In earlier results the variables being communicated were assumed to be the same as the variables being optimized.

III. THE ROUTING MODEL

We present here our basic assumptions, our notation and a simple model by which the nodes in a communication network may adjust the routing of the flow through that network.

We are given a network described by a directed graph $G = (V,E)$. ($V$ is the set of nodes, $E$ the set of directed links.) For each pair $w = (i,j)$ of distinct nodes $i$ and $j$ (also called an origin-destination, or OD, pair) we introduce $P_w$, a set of directed paths from $i$ to $j$, containing no loops. (These are the candidate paths for carrying the flow from $i$ to $j$.) For each OD pair $w = (i,j)$, let $r_w$ be the total arrival rate (at node $i$) of traffic that has to be sent to node $j$ (measured, for example, in packets or bits per second). For each path $p \in P_w$, we denote by $x_{w,p}$ the amount of flow which is routed through path $p$. Naturally, we have the constraints

$$x_{w,p} \ge 0, \quad \forall p \in P_w,\ \forall w, \tag{3.1}$$

$$\sum_{p \in P_w} x_{w,p} = r_w, \quad \forall w. \tag{3.2}$$

Let us define a vector $x_w$ with components $x_{w,p}$, $p \in P_w$. Constraints (3.1), (3.2) may be written compactly as $x_w \in G_w$, where $G_w$ is a simplex (in particular, $G_w$ is compact and convex).

Suppose that there is a total of $M$ OD pairs and let us index them so that the variable $w$ takes values in $\{1,\ldots,M\}$. Then, the totality of flows through the network may be described by a vector $x = (x_1,\ldots,x_M)$. Naturally, $x$ is subject to the constraint $x \in G_1 \times \cdots \times G_M = G$.

For any link $(i,j)$ in the network, let $F^{ij}$ denote the corresponding traffic arrival rate at that link. Clearly,

$$F^{ij} = \sum_{w=1}^{M} \sum_{\substack{p \in P_w \\ (i,j) \in p}} x_{w,p}. \tag{3.3}$$

A cost function, corresponding to some measure of congestion through the network, is introduced. We assume the separable form

$$D = \sum_{(i,j) \in E} D^{ij}(F^{ij}). \tag{3.4}$$

We assume that for each link $(i,j) \in E$, $D^{ij}$ is convex, continuously differentiable and has a Lipschitz continuous derivative.


We are interested in the case where the nodes in the network adjust the path routing variables $x_{w,p}$ so as to minimize (3.4). Since a set of path flow variables $\{x_{w,p} : p \in P_w,\ w \in \{1,\ldots,M\}\}$ determines uniquely the link flow variables $F^{ij}$ (through (3.3)), it is more convenient to express the cost function in terms of the path flow variables. We are thus led to the cost function

$$D(x) = \sum_{(i,j) \in E} \tilde{D}^{ij}(x), \tag{3.5}$$

where

$$\tilde{D}^{ij}(x) = D^{ij}(\langle e^{ij}, x \rangle) \tag{3.6}$$

and $e^{ij}$ is a vector with entries in $\{0,1\}$, determined by (3.3). Clearly, $\tilde{D}^{ij}$ inherits the convexity and smoothness properties of $D^{ij}$.
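The passage from path flows to link flows and to the cost can be illustrated on a small instance. The following is a minimal sketch only: the four-node network, the path sets, the flow values and the quadratic link cost are all hypothetical choices made for illustration and do not come from the paper.

```python
# Hypothetical toy network: node names, paths and costs are illustrative only.
# Each path is a list of directed links (i, j).
paths = {
    "w1": [[("a", "c"), ("c", "d")], [("a", "b"), ("b", "d")]],  # OD pair a -> d
    "w2": [[("b", "d")], [("b", "c"), ("c", "d")]],              # OD pair b -> d
}

def link_flows(x):
    """Equation (3.3): add up the path flows of every path that uses a link."""
    flows = {}
    for w, plist in paths.items():
        for p, flow in zip(plist, x[w]):
            for link in p:
                flows[link] = flows.get(link, 0.0) + flow
    return flows

def cost(x, link_cost=lambda f: f * f):
    """Equations (3.4)-(3.5), with the assumed link cost D^{ij}(F) = F^2."""
    return sum(link_cost(f) for f in link_flows(x).values())

def incidence(link):
    """Vector e^{ij} of (3.6): 1 where the path contains the link, 0 elsewhere."""
    return [1.0 if link in p else 0.0 for w in paths for p in paths[w]]

# Path flows satisfying (3.1)-(3.2) with r_w = 1 for both OD pairs.
x = {"w1": [0.25, 0.75], "w2": [0.5, 0.5]}
F = link_flows(x)
flat_x = [v for w in paths for v in x[w]]
```

Here the inner product of `incidence(link)` with `flat_x` reproduces the corresponding entry of `F`, which is the content of (3.6).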

Let us now consider the situation where the flows change slowly with time, due to re-routing decisions made by the nodes in the network. Accordingly, the flows at time $n$ are described by a vector $x(n) = (x_1(n),\ldots,x_M(n)) \in G$. Let us assume that the routing decisions for the flow corresponding to a particular OD pair $w = (i,j)$ are made by the origin node $i$. In an ideal situation, node $i$ would have access to the exact value of $x(n)$ and perform the update

$$x_w(n+1) = \Big[ x_w(n) - \gamma \mu_w \frac{\partial D}{\partial x_w}(x(n)) \Big]^+. \tag{3.7}$$

(Here $\gamma$ is a positive scalar step-size, $\mu_w$ a positive scaling constant and $[\cdot]^+$ denotes the projection on $G_w$ with respect to the Euclidean norm.) In a practical situation, however, (3.7) is bound to be unrealistic for several reasons:

(i) It assumes perfect synchronization of all origin nodes.

(ii) It assumes that $x(n)$ (or, equivalently, the link flows $F^{ij}(n)$ at time $n$) may be measured exactly at time $n$.

(iii) Even if the origin node $i$ is able to compute $x_w(n+1)$ exactly through (3.7), the actual flows through the network, at time $n+1$, will be different from the computed ones, unless the settling time is negligible.

The above necessitate the development of a more realistic model, which is done below.

First, because of remark (iii) we will differentiate between the actual flows through the network (denoted by $x(n)$, $x_w(n)$, etc.) and the desired flows, as determined by the computations of some node; the latter will be denoted by $\bar{x}(n)$ and $\bar{x}_w(n)$. The routing decisions of some node at time $n$ are determined by the desired flows $\bar{x}_w(n)$. However, due to transients, each component $x_{w,p}(n)$ of the actual flow $x(n)$ will have some value between $\bar{x}_{w,p}(n)$ and $x_{w,p}(n-1)$. Similarly, $x_{w,p}(n-1)$ will be a convex combination of $\bar{x}_{w,p}(n-1)$ and $x_{w,p}(n-2)$. Repeating this procedure, we conclude that $x_{w,p}(n)$ is in the convex hull of $x_{w,p}(0), \bar{x}_{w,p}(1), \ldots, \bar{x}_{w,p}(n)$. For $n$ large enough, $x_{w,p}(0)$ should have negligible influence on $x_{w,p}(n)$ and will be ignored for convenience. We may thus conclude that there exist nonnegative coefficients $a_{w,p}(n;k)$ such that

$$\sum_{k=1}^{n} a_{w,p}(n;k) = 1, \quad \forall n, w, p \in P_w, \tag{3.8}$$

$$x_{w,p}(n) = \sum_{k=1}^{n} a_{w,p}(n;k)\, \bar{x}_{w,p}(k), \quad \forall n, w, p \in P_w. \tag{3.9}$$

It seems realistic to assume that if $\bar{x}_{w,p}(k)$ is held constant, say equal to $\bar{x}$, the actual flows $x(n)$ should settle to $\bar{x}$ at a geometric rate. Accordingly:

Assumption: There exist constants $B \ge 0$, $\beta \in [0,1)$ such that

$$a_{w,p}(n;k) \le B \beta^{n-k}, \quad \forall n, k, w, p \in P_w. \tag{3.10}$$

Concerning the computation of the desired flows we postulate an update rule of the form (cf. (3.7))

$$\bar{x}_w(n+1) = [\bar{x}_w(n) - \gamma \mu_w \lambda_w(n)]^+. \tag{3.11}$$

Here $\lambda_w(n)$ is some estimate of $\frac{\partial D}{\partial x_w}(x(n))$ which is, in general, inexact due to asynchronism and delays in obtaining measurements. However, it would be unnatural to assume that the computation (3.11) is carried out at each time instance for each OD pair. We therefore define a set $T_w$ of times for which (3.11) is used. For all $n \notin T_w$, we simply let $\bar{x}_w(n+1) = \bar{x}_w(n)$. We only assume that the time between consecutive updates (equivalently, the difference of consecutive elements of $T_w$) is bounded, for each $w$.

We now describe the process by which $\lambda_w(n)$ is formed.

For each link $(i,j)$, node $i$ estimates from time to time the amount of traffic through that link. Practically, these estimates do not correspond to instantaneous measurements but to an average of a set of measurements obtained over some period of time. Accordingly, at each time $n$, node $i$ has available an estimate

$$\hat{F}^{ij}(n) = \sum_{m=n-Q}^{n} c^{ij}(n;m)\, F^{ij}(m). \tag{3.12}$$

Here, $c^{ij}(n;m)$ are nonnegative scalars summing to one (for fixed $n$), and $Q$ is a bound on the time over which measurements are averaged plus the time between the computation of consecutive estimates of the flow. These estimates are broadcast from time to time (asynchronously and possibly with some variable delay). Let us assume that the time between consecutive broadcasts plus the communication delay until the broadcast messages reach all nodes is bounded by some $T$. It follows that at time $n$ each node $k$ knows the value of $\hat{F}^{ij}(m_k)$, for some $m_k$ with $n-T \le m_k \le n$. Combining this observation with (3.12) we conclude that at time $n$, each node $k$ knows an estimate $\hat{F}_k^{ij}(n)$ satisfying

$$\hat{F}_k^{ij}(n) = \sum_{m=n-C}^{n} d_k^{ij}(n;m)\, F^{ij}(m), \tag{3.13}$$

where $C = T + Q$ and $d_k^{ij}(n;m)$ are (unknown) nonnegative coefficients summing to one, for fixed $n$.

For each OD pair $w$, the corresponding origin node (let us denote it by $k$) uses the values of $\hat{F}_k^{ij}(n)$ to form an estimate $\lambda_w(n)$ of $\frac{\partial D}{\partial x_w}(x(n))$ as follows. Note that, with $\tilde{D}^{ij}(x) = D^{ij}(\langle e^{ij}, x \rangle)$ as in (3.6),

$$\frac{\partial \tilde{D}^{ij}}{\partial x_{w,p}}(x(n)) = \begin{cases} \dfrac{\partial D^{ij}}{\partial F^{ij}}(F^{ij}(n)), & \text{if } (i,j) \in p, \\ 0, & \text{otherwise.} \end{cases} \tag{3.14}$$

Accordingly, a natural estimate is given (componentwise) by:

$$\lambda_{w,p}(n) = \sum_{(i,j) \in p} \frac{\partial D^{ij}}{\partial F^{ij}}(\hat{F}_k^{ij}(n)). \tag{3.15}$$

The development of our model is now complete. To summarize, the basic equation is (3.11), where $x(n)$ is determined by (3.9), $\lambda_w(n)$ is determined by (3.15), $\hat{F}_k^{ij}(n)$ is given by (3.13) and $F^{ij}$ is related to $x$ by (3.3).
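To make the interplay of (3.9), (3.11), (3.13) and (3.15) concrete, the following is a minimal simulation sketch under strong simplifying assumptions that are not part of the paper: a single OD pair with $r_w = 1$, two parallel single-link paths, link cost $D^{ij}(F) = F^2$, a first-order geometric settling rule as one instance of (3.9)-(3.10), a randomly chosen bounded measurement delay, and an update every other time step. The function names and all parameter values are hypothetical.

```python
import random

def project_simplex(v, r=1.0):
    """Euclidean projection of v onto {x >= 0, sum(x) = r} (sort-based method)."""
    u = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        css += ui
        t = (css - r) / i
        if ui - t > 0:          # holds exactly for the active prefix
            theta = t
    return [max(vi - theta, 0.0) for vi in v]

def simulate(gamma=0.05, mu=1.0, settle=0.5, max_delay=3, steps=400, seed=0):
    """One OD pair, two parallel one-link paths, assumed cost D^{ij}(F) = F^2."""
    rng = random.Random(seed)
    desired = [1.0, 0.0]        # desired flows, x-bar(n)
    actual = [1.0, 0.0]         # actual flows, x(n); here also the link flows
    history = [list(actual)]    # history[m] holds the link flows at time m
    for n in range(steps):
        # Stale information: flows from some time m with n - max_delay <= m <= n.
        m = max(0, n - rng.randint(0, max_delay))
        measured = history[m]
        lam = [2.0 * f for f in measured]       # (3.15) with dD/dF = 2F
        if n % 2 == 0:                          # n belongs to T_w
            desired = project_simplex(
                [d - gamma * mu * l for d, l in zip(desired, lam)])
        # Geometric settling of actual flows toward the desired flows.
        actual = [settle * a + (1.0 - settle) * d
                  for a, d in zip(actual, desired)]
        history.append(list(actual))
    return desired, actual

desired, actual = simulate()
```

With these (assumed) parameter values the desired and actual flows both approach the even split, which minimizes the cost in this toy instance; larger step sizes or delays would need to be checked separately.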

Let us close this section with a remark. A distributed version of the Bellman algorithm for shortest paths has been shown to converge appropriately [8], [9] even if the time between consecutive broadcasts is unbounded. In our model, however, we have to assume boundedness because otherwise there are examples that demonstrate that convergence is not guaranteed. Of course, such an assumption is always observed in practice.

A simple example is the following: consider the network of Figure 1. There are three origin nodes (nodes 1, 2, and 3), with input arrival rate equal to 1 at each one of them, and a single destination node (node 6). For each OD pair there are two paths. For each origin node $i$, let $x_i$ denote the flow routed through the path containing node 4. Let $D^{ij}(F^{ij}) = (F^{ij})^2$ for $(i,j) = (4,6)$ or $(5,6)$ and $D^{ij}(F^{ij}) = 0$ for all other links. In terms of the variables $x_1, x_2, x_3$, the cost becomes

$$D(x_1,x_2,x_3) = (x_1+x_2+x_3)^2 + (3-x_1-x_2-x_3)^2. \tag{3.16}$$

We assume that the settling time is zero, so that we do not need to distinguish between actual and desired flows, and that each node $i$ ($i=1,2,3$) knows $x_i$ exactly and is able to transmit its value instantaneously to the remaining origin nodes. Suppose that initially $x_1=x_2=x_3=1$ and that each origin node executes a large number of gradient projection iterations with a small stepsize before communicating the current value of $x_i$ to the other nodes. Then, effectively, node $i$ solves the problem

$$\min_{0 \le x_i \le 1} \{ (x_i+2)^2 + (1-x_i)^2 \},$$

thereby obtaining the value $x_i=0$. At that point the processors broadcast their current values of $x_i$. If this sequence of events is repeated, each $x_i$ will become again equal to 1. So, $(x_1,x_2,x_3)$ oscillates between $(0,0,0)$ and $(1,1,1)$ without ever converging to an optimal routing. The same behavior is also observed if the cost function (3.16) is modified by adding a term $\epsilon(x_1^2+x_2^2+x_3^2)$, which makes it strictly convex, as long as $0 < \epsilon \ll 1$.
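The oscillation in this example can be reproduced in a few lines. The sketch below idealizes "a large number of gradient projection iterations" as an exact one-dimensional minimization by each node against the last values it received; the function names are hypothetical.

```python
def best_response(others_sum):
    """Minimize (x + s)^2 + (3 - x - s)^2 over 0 <= x <= 1, with s = others_sum.

    Setting the derivative 4(x + s) - 6 to zero gives x = 1.5 - s;
    clipping to [0, 1] gives the constrained minimizer of this convex function.
    """
    return min(1.0, max(0.0, 1.5 - others_sum))

def cost(x):
    """Cost (3.16) as a function of (x_1, x_2, x_3)."""
    total = sum(x)
    return total ** 2 + (3.0 - total) ** 2

x = [1.0, 1.0, 1.0]
trajectory = [tuple(x)]
for _ in range(4):
    # Each node optimizes against the values last broadcast by the others,
    # then all nodes broadcast their new values at the same time.
    x = [best_response(sum(x) - xi) for xi in x]
    trajectory.append(tuple(x))
```

The trajectory alternates between $(1,1,1)$ and $(0,0,0)$, both of cost 9, whereas any routing with $x_1+x_2+x_3 = 1.5$ attains the minimum value 4.5.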


IV. RESULT AND CONVERGENCE PROOF

Theorem: With the algorithm and the assumptions introduced in the last section and provided that the step-size $\gamma$ is chosen small enough, $D(x(n))$ converges to $\min_{x \in G} D(x)$ and any limit point of $\{x(n)\}$ is a minimizing point. Moreover, $x(n) - \bar{x}(n)$ converges to zero. Finally, if each $D^{ij}$ is strictly convex (as a function of the link flow $F^{ij}$) and if, for each OD pair $w=(i,j)$, $P_w$ contains all paths from $i$ to $j$, then the link flows $F^{ij}(n)$ converge to their (unique) optimal values.

Lemma: Let $[\cdot]^+$ denote projection on a convex set $G \subset \mathbb{R}^n$. Assume that $0 \in G$. Then,

$$\langle a, [a]^+ \rangle \ge \| [a]^+ \|^2, \quad \forall a \in \mathbb{R}^n. \tag{4.1}$$

Proof: If $a \in G$, $[a]^+ = a$ and (4.1) holds trivially. So, let us assume that $a \notin G$ and form a triangle with vertices at the points $a$, $[a]^+$ and the origin, denoted by $A$, $B$, $O$, respectively (see Figure 2). Let $G_0$ be the intersection of $G$ with the plane defined by that triangle. Let us draw the normal to $AB$ through point $B$. This line is a supporting hyperplane for $G_0$. Therefore, $O$ and $A$ lie at different sides of that line; hence the angle $OBA$ is larger than 90 degrees. Let us now draw the normal to $OB$ through $B$. It must intersect the segment $OA$ at some point $C$, because $\angle OBA > 90^\circ$. Hence,

$$\| [a]^+ \|^2 = \| OB \|^2 = \langle OB, OC \rangle \le \langle OB, OA \rangle = \langle a, [a]^+ \rangle.$$

By translating the origin to an arbitrary point $x$, (4.1) becomes:

$$\langle a, [x+a]^+ - x \rangle \ge \| [x+a]^+ - x \|^2, \quad \forall x \in G,\ a \in \mathbb{R}^n. \tag{4.2}$$
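Inequality (4.2) is easy to probe numerically. The sketch below takes $G$ to be the unit simplex, as in the routing model with $r_w = 1$, and evaluates the gap between the two sides of (4.2) at randomly drawn points. The sort-based projection routine, the dimension and the sampling ranges are assumptions made for illustration; a numerical check of this kind is of course no substitute for the proof.

```python
import random

def project_simplex(v, r=1.0):
    """Euclidean projection of v onto {x >= 0, sum(x) = r} (sort-based method)."""
    u = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        css += ui
        t = (css - r) / i
        if ui - t > 0:
            theta = t
    return [max(vi - theta, 0.0) for vi in v]

def lemma_gap(x, a):
    """<a, [x+a]^+ - x> - ||[x+a]^+ - x||^2; nonnegative by (4.2) when x is in G."""
    shifted = project_simplex([xi + ai for xi, ai in zip(x, a)])
    step = [p - xi for p, xi in zip(shifted, x)]
    inner = sum(ai * si for ai, si in zip(a, step))
    return inner - sum(si * si for si in step)

rng = random.Random(1)
gaps = []
for _ in range(1000):
    x = project_simplex([rng.uniform(-1.0, 2.0) for _ in range(4)])  # a point of G
    a = [rng.uniform(-3.0, 3.0) for _ in range(4)]
    gaps.append(lemma_gap(x, a))
```

Up to floating-point rounding, every entry of `gaps` should be nonnegative, in agreement with (4.2).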

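Proof of the Theorem: We define $s(n)$ to be the vector with components

$$s_w(n) = \begin{cases} [\bar{x}_w(n) - \gamma \mu_w \lambda_w(n)]^+ - \bar{x}_w(n), & n \in T_w, \\ 0, & n \notin T_w, \end{cases} \tag{4.3}$$

so that

$$\bar{x}_w(n+1) = \bar{x}_w(n) + s_w(n). \tag{4.4}$$

Using (4.2) with $a = -\gamma \mu_w \lambda_w(n)$, we obtain

$$\langle s_w(n), \lambda_w(n) \rangle \le -\frac{1}{\gamma \mu_w} \| s_w(n) \|^2. \tag{4.5}$$

Using (4.4), (3.9) and the assumption (3.10), it is easy to show that for some $A_1 \ge 0$ (independent of $\gamma$ or $n$)

$$\| x(n) - \bar{x}(n) \| \le A_1 \sum_{k=1}^{n-1} \beta^{n-k} \| s(k) \|. \tag{4.6}$$

Furthermore, comparing (3.14) to (3.15) and using the Lipschitz continuity of $\partial D^{ij} / \partial F^{ij}$, we conclude that for some constants $A_2, \ldots, A_7$ (independent of $\gamma$)

$$\begin{aligned}
\Big\| \frac{\partial D}{\partial x_w}(x(n)) - \lambda_w(n) \Big\|
&\le A_2 \max_{i,j} | F^{ij}(n) - \hat{F}_k^{ij}(n) | \\
&\le A_3 \max_{i,j} \max_{n-C \le m \le n} | F^{ij}(m) - F^{ij}(n) | \\
&\le A_4 \max_{n-C \le m \le n} \| x(m) - x(n) \| \\
&\le A_4 \max_{n-C \le m \le n} \{ \| x(m) - \bar{x}(m) \| + \| \bar{x}(m) - \bar{x}(n) \| + \| \bar{x}(n) - x(n) \| \} \\
&\le A_5 \sum_{k=1}^{n-1} \beta^{n-k} \| s(k) \| + A_6 \sum_{m=n-C}^{n-1} \| s(m) \| \\
&\le A_7 \sum_{k=1}^{n-1} \beta^{n-k} \| s(k) \|.
\end{aligned} \tag{4.7}$$

(The second inequality follows from (3.13), the third from (3.3), the fourth is the triangle inequality, the fifth uses (4.6).) Using Lipschitz continuity once more, (4.6) and (4.7) we finally obtain, for some $A_8 > 0$ (independent of $n$, $\gamma$),

$$\Big\| \frac{\partial D}{\partial x_w}(\bar{x}(n)) - \lambda_w(n) \Big\| \le A_8 \sum_{k=1}^{n-1} \beta^{n-k} \| s(k) \|. \tag{4.8}$$

Using a first order series expansion for $D$, we have

$$\begin{aligned}
D(\bar{x}(n+1)) &\le D(\bar{x}(n)) + \sum_w \Big\langle \frac{\partial D}{\partial x_w}(\bar{x}(n)), s_w(n) \Big\rangle + A_9 \| s(n) \|^2 \\
&\le D(\bar{x}(n)) + \sum_w \langle \lambda_w(n), s_w(n) \rangle + A_8 \sum_{k=1}^{n-1} \beta^{n-k} \| s(k) \| \, \| s(n) \| + A_9 \| s(n) \|^2 \\
&\le D(\bar{x}(n)) - \frac{A_{10}}{\gamma} \| s(n) \|^2 + A_{11} \sum_{k=1}^{n} \beta^{n-k} \| s(k) \|^2.
\end{aligned} \tag{4.9}$$

(Here, the second inequality was obtained from (4.8); the third from (4.5).) Summing (4.9) for different values of $n$ and rearranging terms we obtain

$$D(\bar{x}(n+1)) \le D(\bar{x}(1)) - \sum_{k=1}^{n} \Big[ \frac{A_{10}}{\gamma} - A_{11} \sum_{i=k}^{n} \beta^{i-k} \Big] \| s(k) \|^2. \tag{4.10}$$

Suppose that $\gamma$ is small enough so that

$$\frac{A_{10}}{\gamma} - \frac{A_{11}}{1-\beta} > 0.$$

Note that $D$ is continuous on a compact set, hence bounded below. Let $n \to \infty$ in (4.10) to obtain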

$$\sum_{k=1}^{\infty} \| s(k) \|^2 < \infty, \tag{4.11}$$

and, in particular, $\lim_{k \to \infty} \| x(k) - \bar{x}(k) \| = \lim_{k \to \infty} \| s(k) \| = 0$. Let

$$f_w(x) = \Big[ x_w - \gamma \mu_w \frac{\partial D}{\partial x_w}(x) \Big]^+ - x_w. \tag{4.12}$$

Using (4.7) we conclude that $\| \lambda_w(n) - \frac{\partial D}{\partial x_w}(x(n)) \|$ converges to zero. Hence, $\| f_w(x(n)) - s_w(n) \|$ converges to zero along any integer sequence contained in $T_w$.

Let $x^*$ be a limit point of $\{x(n)\}$. (At least one exists because $G$ is compact.) Since $\| s(n) \| \to 0$ and since the difference of consecutive elements of $T_w$ is bounded, $x^*$ is also a limit point of $\{x(n) : n \in T_w\}$. Pick a subsequence $\{x(n_k)\}$, $n_k \in T_w$, which converges to $x^*$. Since $s_w(n_k) \to 0$, we also have $f_w(x(n_k)) \to 0$. Clearly, $f_w$ is continuous, which implies that $f_w(x^*) = 0$. Since this is true for all $w$, the sufficient conditions for optimality are satisfied at $x^*$. Finally, using (4.10), it is easy to see that $D(\bar{x}(n))$, and hence $D(x(n))$, converge; the limit can only be the optimal value of $D$.

The last part of the theorem is trivial.

V. CONCLUSIONS

Gradient projection algorithms for routing in a data network converge appropriately even in the face of substantial asynchronism and even if the time required for the network to adjust to a change in the routing policies (settling time) is non-negligible.

While convergence is proved under the assumption that the input arrival rates $r_w$ are constant, it is expected that the algorithm will be able to adjust appropriately in the face of small variations. If input variations become substantial, however, and the quasistatic assumption is violated, a more detailed analysis is required, incorporating stochastic effects.

Another idealization in our model arises in the measurement equation (3.12), which assumes that measurements are noiseless. This is a reasonable assumption if the time average runs over a sufficiently long period but may be unrealistic otherwise, necessitating again a more elaborate stochastic model.

Finally, let us mention an important related class of distributed algorithms. In the present model the nodes measure and broadcast messages with their estimates of the link flows $F^{ij}$. Other nodes receive the broadcast messages and use them to compute estimates of the expression $\frac{\partial D^{ij}}{\partial F^{ij}}(F^{ij})$ which is required in the algorithm. An alternative possibility would be to let a node, say node $j$, measure directly or compute the value of $\frac{\partial D^{ij}}{\partial F^{ij}}$ and broadcast that value to the other nodes.

For certain special choices of the cost function and under certain assumptions, the partial derivative $\frac{\partial D^{ij}}{\partial F^{ij}}$ equals the average delay of a packet traveling through link $(i,j)$. In that case, it is very natural to assume that this derivative may be measured directly, without first measuring the flow $F^{ij}$. Our result may be easily shown to be valid for this class of algorithms as well.